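In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and normalizing the principal directions with respect to the geometry of the local convex function around the mean. This generalization allows the learned representation to be exported as a fixed layer with a non-linearity. As an application to knowledge distillation, we cast the learning problem for the student network as predicting the compression coefficients of the teacher's representations, which are passed as input to the imported layer. Our empirical findings indicate that our approach is substantially more effective at transferring information between networks than typical teacher-student training that uses the teacher's penultimate-layer representations and soft labels.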
translated by 谷歌翻译
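For reference, the Bregman divergence named in the first abstract has a standard definition. The block below is a sketch under the assumption that the layer's transfer function $f$ is the gradient of a strictly convex, differentiable function $F$; the symbols $F$, $f$, $x$, and $y$ are notation chosen for this illustration and do not come from the abstract.

```latex
D_F(x, y) \;=\; F(x) - F(y) - \nabla F(y)^{\top} (x - y),
\qquad f = \nabla F .
```

With $F(x) = \tfrac{1}{2}\lVert x \rVert^2$ the transfer function is the identity and $D_F$ reduces to half the squared Euclidean distance, which is the setting of ordinary PCA.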
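For industrial-scale advertising systems, predicting the click-through rate (CTR) of ads is a central problem. Ad clicks constitute an important class of user engagement and are often used as the primary signal of an ad's usefulness to users. In addition, in cost-per-click advertising systems, where advertisers are charged per click, the expected click rate feeds directly into value estimation. CTR model development is therefore a significant investment for most Internet advertising companies. Engineering for such problems requires many machine learning (ML) techniques suited to online learning that go well beyond traditional accuracy improvements, especially with regard to efficiency, reproducibility, calibration, and credit attribution. We present a case study of practical techniques deployed in Google's search ads CTR model. This paper provides an industry case study that highlights important areas of current ML research and illustrates how impactful new ML methods are evaluated and made useful in a large-scale industrial setting.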
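Transformer models have recently emerged as one of the foundational models in natural language processing and, as a byproduct, there has been significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, which calls for more research into identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture, inspired by the literature on statistical language modeling, that augments the model with n-grams constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer, on language modeling on the C4 dataset and on text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model in JAX for reproducibility.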
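The idea of building n-grams from a discrete latent representation can be illustrated with a small sketch. It assumes each token has already been assigned a cluster id in `range(num_clusters)`, pairs every id with its predecessor, and maps the pair into a fixed-size embedding table. The function name, the zero padding at the first position, and the modular hash are all hypothetical choices made for this example, not details stated in the abstract.

```python
def bigram_ids(cluster_ids, num_clusters, table_size):
    """Map each (previous, current) cluster-id pair to a bigram table index."""
    ids = []
    prev = 0  # assumption: the first position is padded with cluster id 0
    for c in cluster_ids:
        if not 0 <= c < num_clusters:
            raise ValueError(f"cluster id {c} outside [0, {num_clusters})")
        combined = prev * num_clusters + c  # unique integer per (prev, c) pair
        ids.append(combined % table_size)   # assumption: simple modular hash
        prev = c
    return ids


# Four tokens whose latent cluster ids are 3, 1, 4, 1.
example = bigram_ids([3, 1, 4, 1], num_clusters=8, table_size=50)
```

The resulting indices would select rows of an n-gram embedding table whose vectors are then combined with the ordinary token embeddings.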
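In computer vision, there is a growing discrepancy between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper, we address this issue and significantly bridge the gap between these two types of models. Throughout our empirical investigation, we do not necessarily aim to propose a new method; rather, we strive to identify a robust and effective recipe for making state-of-the-art large-scale models affordable in practice. We demonstrate that, when performed correctly, knowledge distillation can be a powerful tool for reducing the size of large models without compromising their performance. In particular, we find that certain implicit design choices may drastically affect the effectiveness of distillation. Our key contribution is the explicit identification of these design choices, which had not previously been articulated in the literature. We back up our findings with a comprehensive empirical study, demonstrate compelling results on a wide range of vision datasets and, in particular, obtain a state-of-the-art ResNet-50 model for ImageNet that achieves 82.8% top-1 accuracy.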
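As a point of reference for the distillation setting above, the following sketch computes a commonly used distillation objective: the KL divergence between the teacher's and the student's temperature-softened class distributions. The abstract does not state which loss the authors use, so this formulation, the function names, and the `temperature` parameter are assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between temperature-softened distributions."""
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same classes")
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# The loss is zero when the student matches the teacher and positive otherwise.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([2.0, 0.0], [0.0, 2.0])
```

In training, this quantity would be averaged over a batch and minimized with respect to the student's parameters while the teacher stays fixed.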
Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning (jointly trained wide linear models and deep neural networks) to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
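The joint model described above can be sketched as a single logistic output over the sum of a wide linear term and a deep term. This is a minimal illustration, not the authors' TensorFlow implementation: the function names, the feature names, and the hand-supplied `deep_activations` (standing in for the last hidden layer of a trained network) are all hypothetical.

```python
import math


def cross_product(features, names):
    """Return 1.0 only if every named binary feature is active (a logical AND)."""
    return 1.0 if all(features.get(n, 0) == 1 for n in names) else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wide_and_deep_score(wide_inputs, wide_weights,
                        deep_activations, deep_weights, bias):
    """Logistic output over the sum of the wide and deep contributions."""
    if len(wide_inputs) != len(wide_weights):
        raise ValueError("wide inputs and weights differ in length")
    if len(deep_activations) != len(deep_weights):
        raise ValueError("deep activations and weights differ in length")
    wide = sum(w * x for w, x in zip(wide_weights, wide_inputs))
    deep = sum(w * a for w, a in zip(deep_weights, deep_activations))
    return sigmoid(wide + deep + bias)


# Hypothetical sparse binary features for one impression.
features = {"installed_app=A": 1, "impression_app=B": 1}
crossed = cross_product(features, ["installed_app=A", "impression_app=B"])
score = wide_and_deep_score([crossed], [0.8], [0.2, -0.1], [1.5, 0.5], bias=-0.3)
```

Because both terms feed one output, a single loss gradient updates the wide weights and the deep network together, which is what distinguishes joint training from ensembling two separately trained models.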
Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GloVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embeddings have significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner. The clauses consist of contextual words like "black," "cup," and "hot" that define other words like "coffee," and are thus human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locates similar words.
Deep learning techniques with neural networks have been used effectively in computational fluid dynamics (CFD) to obtain solutions to nonlinear differential equations. This paper presents a physics-informed neural network (PINN) approach to solving for the Blasius function. This method eliminates the need to convert the nonlinear differential equation into an initial value problem. It also tackles the convergence issue that arises in the conventional series solution. The method produces results on par with those of numerical and conventional methods. The solution is extended to the negative axis to show that PINNs capture the singularity of the function at $\eta=-5.69$.
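For context, the Blasius boundary-value problem and one plausible PINN training objective can be written as follows. The abstract states neither the scaling of the equation nor the loss, so the form $2f''' + f f'' = 0$, the collocation points $\eta_i$, the truncation point $\eta_{\max}$, and the unweighted sum of penalty terms are assumptions made for this sketch; $f_\theta$ denotes the network's approximation of $f$.

```latex
2 f'''(\eta) + f(\eta)\, f''(\eta) = 0, \qquad
f(0) = 0, \quad f'(0) = 0, \quad \lim_{\eta \to \infty} f'(\eta) = 1 .

\mathcal{L}(\theta) =
\frac{1}{N} \sum_{i=1}^{N}
\bigl( 2 f_\theta'''(\eta_i) + f_\theta(\eta_i)\, f_\theta''(\eta_i) \bigr)^{2}
+ f_\theta(0)^{2} + f_\theta'(0)^{2}
+ \bigl( f_\theta'(\eta_{\max}) - 1 \bigr)^{2} .
```

Because the far-field condition enters the loss directly, no shooting on $f''(0)$ is needed, which is the sense in which the conversion to an initial value problem is avoided.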
Agile robotics presents a difficult challenge with robots moving at high speeds requiring precise and low-latency sensing and control. Creating agile motion that accomplishes the task at hand while being safe to execute is a key requirement for agile robots to gain human trust. This requires designing new approaches that are flexible and maintain knowledge over world constraints. In this paper, we consider the problem of building a flexible and adaptive controller for a challenging agile mobile manipulation task of hitting ground strokes on a wheelchair tennis robot. We propose and evaluate an extension to work done on learning striking behaviors using a probabilistic movement primitive (ProMP) framework by (1) demonstrating the safe execution of learned primitives on an agile mobile manipulator setup, and (2) proposing an online primitive refinement procedure that utilizes evaluative feedback from humans on the executed trajectories.
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
Recently, deep networks have shown impressive performance in the segmentation of cardiac Magnetic Resonance Imaging (MRI) images. However, this success is proving slow to translate into widespread use in medical clinics because robustness issues lead clinicians to place little trust in the results. Predicting the run-time quality of segmentation masks can be useful for warning clinicians about poor results. Despite its importance, there are few studies on this problem. To address this gap, we propose a quality control method based on the agreement across decoders of a multi-view network, TMS-Net, measured by cosine similarity. The network takes three view inputs resliced from the same 3D image along different axes. Unlike previous multi-view networks, TMS-Net has a single encoder and three decoders, leading to better noise robustness, segmentation performance, and run-time quality estimation in our experiments on the segmentation of the left atrium on the STACOM 2013 and STACOM 2018 challenge datasets. We also present a way to generate poor segmentation masks by using noisy images generated with engineered noise and Rician noise to simulate undertraining, high anisotropy, and poor imaging settings. Our run-time quality estimation method shows good classification of poor- and good-quality segmentation masks, with an AUC reaching 0.97 on STACOM 2018. We believe that TMS-Net and our run-time quality estimation method have high potential to increase clinicians' trust in automatic image analysis tools.
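The agreement measure described above can be sketched as a cosine similarity between flattened decoder outputs. The abstract does not say how the three pairwise similarities are combined into one quality score, so averaging them is an assumption, as are the function names and the choice to treat an all-zero mask as having zero similarity.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two flattened masks of equal length."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an empty mask agrees with nothing
    return dot / (norm_a * norm_b)


def decoder_agreement(masks):
    """Mean pairwise cosine similarity across decoder outputs."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two decoder outputs")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Three flattened masks: two decoders agree, the third disagrees completely.
agreement = decoder_agreement([[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]])
```

A low score would flag a segmentation for review; the threshold separating poor from good masks would have to be chosen on validation data.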